Count-Min Sketch
A random hashing-based method for frequent elements problem.
Solves point query problem: given any value
,
let
be the number of times
appears in the stream.
Goal: return estimate
such that
.
Solving Frequent items: return all items for which
.
(assume access to a Uniformly Random Hash
Function)
Count-Min Update:
- Choose random hash function
mapping to
- For
:
given item
,
set
return estimate
- always have
(this rightward summation term is error in frequency estimate)
Expected error is
Bound of probability of error
?
Use Markovβs Inequality:
Claim: for any
,
with probability at least
,
To solve point query with error
,
set
.
length
arrays Estimate
with
.
- for every
and
and
,
we know that with probability
,
#incomplete
See also: (Ξ΅, k)-Frequent Items Problem
References:
- G. Cormode and S. Muthukrishnan, βAn improved data stream summary:
the count-min sketch and its applications,β Journal of
Algorithms, vol. 55, no. 1, pp. 58β75, Apr. 2005, doi: 10.1016/j.jalgor.2003.12.001.
- https://www.chrismusco.com/amlds2023/notes/lecture01.html#Count-Min_Sketch
- https://www.chrismusco.com/amlds2023/lectures/lec1_annotated.pdf
^4e6aed